Abstract: We suggest an iterative approach to computing K-step maximum likelihood estimates (MLE) of the parametric components in semiparametric models based on their profile likelihoods. The higher order convergence rate of the K-step MLE depends mainly on the precision of its initial estimate and on the convergence rate of the nuisance functional parameter in the semiparametric model. Moreover, we show that the K-step MLE is as asymptotically efficient as the regular MLE after a finite number of iterative steps. Our theory is verified for several specific semiparametric models. Simulation studies are also presented to support these theoretical results.
1. Introduction

Let $X_1, \ldots, X_n$ be independent and identically distributed random variables from a semiparametric model $\mathcal{P} = \{P_{\theta,\eta} : \theta \in \Theta, \eta \in \mathcal{H}\}$, where $\theta$ is a $d$-dimensional parameter of interest and $\eta$ is an infinite dimensional nuisance parameter. A well-known method of estimating the parameter $\theta$ in a semiparametric model is to solve for $\theta$ in the estimating equation

$$\sum_{i=1}^{n} \tilde\ell_{\theta,\hat\eta_n}(X_i) = 0, \qquad (1.1)$$

where $\hat\eta_n$ is some estimator of the nuisance parameter and $\tilde\ell_{\theta,\eta}$ is the efficient score function for $\theta$, whose definition is given in Section 2.1. However, there are at least two concerns in solving (1.1). First, (1.1) may have multiple roots, and identifying the consistent one among them can be very challenging. Second, this estimation approach requires an explicit form of the efficient score function, which in general is only implicitly defined as an orthogonal projection. Although in semiparametric models with convex parametrization (page 305 in Bickel, Klaassen, Ritov and Wellner (1998)) we can estimate $\theta$ simply by solving $\sum_{i=1}^n \dot\ell_{\theta,\hat\eta_n}(X_i) = 0$, where $\dot\ell_{\theta,\eta}$ is the regular score function for $\theta$ defined in Section 2.1, many other semiparametric models of interest do not possess such nice properties.

The above concerns can be addressed by the profile likelihood based K-step maximum likelihood estimate proposed in this paper. Under fairly general assumptions, the K-step MLE is shown to possess higher order asymptotic efficiency relative to the MLE of $\theta$ in semiparametric models. The motivation for constructing the k-step estimator $\hat\theta_n^{(k)}$ comes from the Newton-Raphson algorithm for solving (1.1) with respect to $\theta$, starting at the initial guess $\hat\theta_n^{(0)}$. Thus, we can define the k-step estimator iteratively in the form

$$\hat\theta_n^{(k)} = \hat\theta_n^{(k-1)} + \Big(\mathbb{P}_n \tilde\ell_{\hat\theta_n^{(k-1)},\hat\eta_n}\tilde\ell^{T}_{\hat\theta_n^{(k-1)},\hat\eta_n}\Big)^{-1}\mathbb{P}_n \tilde\ell_{\hat\theta_n^{(k-1)},\hat\eta_n} \qquad (1.2)$$

for $k = 1, 2, \ldots$, where $\mathbb{P}_n f = \sum_{i=1}^n f(X_i)/n$ and $\hat\theta_n^{(0)}$ is some preliminary estimator for $\theta$. In parametric models, the K-step MLE is defined similarly, but with the efficient score function in (1.2) replaced by the regular score function for $\theta$. Under some regularity conditions in parametric models, Janssen, Jureckova and Veraverbeke (1985) show that

$$\hat\theta_n^{(1)} - \hat\theta_n = O_P(n^{-1}) \quad\text{and}\quad \hat\theta_n^{(2)} - \hat\theta_n = O_P(n^{-3/2}), \qquad (1.3)$$

where $\hat\theta_n$ is the maximum likelihood estimate for $\theta$. Previous studies of the K-step MLE (Bickel, Klaassen, Ritov and Wellner (1998) and Van der Vaart (1998)) focus only on semiparametric models with convex parametrization, in which the efficient score functions can be estimated explicitly. Given certain no-bias conditions on the estimated efficient score functions, Van der Vaart (1998) shows that $\hat\theta_n^{(1)} = \hat\theta_n + o_P(n^{-1/2})$. Moreover, the K-step approach is also used in local (quasi-)likelihood estimation to reduce computational cost; see Fan and Chen (1999), Fan, Chen and Zhou (2006) and Cai, Fan and Li (2000). However, as far as we are aware, no systematic study has been done so far on the construction of the K-step semiparametric MLE and its higher order asymptotic efficiency.
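In the parametric setting, (1.2) with the regular score in place of the efficient score is a Newton-Raphson type update in which the information is estimated by the empirical second moment of the score. The Python sketch below illustrates this under assumptions of our own choosing: a Cauchy location model (whose MLE has no closed form), the sample median as the $\sqrt n$-consistent preliminary estimator, and an arbitrary sample size and seed. It only illustrates the update and is not taken from the cited references.

```python
import math
import random
import statistics


def cauchy_score(x, theta):
    """Regular score d/dtheta log f(x - theta) for the standard Cauchy location model."""
    u = x - theta
    return 2.0 * u / (1.0 + u * u)


def k_step_parametric(data, theta0, k):
    """Apply k updates theta <- theta + (P_n s^2)^{-1} P_n s, with s the score at theta."""
    theta = theta0
    n = len(data)
    for _ in range(k):
        scores = [cauchy_score(x, theta) for x in data]
        mean_score = sum(scores) / n                 # P_n s
        mean_sq = sum(s * s for s in scores) / n     # P_n s^2, the information estimate
        theta = theta + mean_score / mean_sq
    return theta


rng = random.Random(2024)
true_theta = 1.0
# Standard Cauchy noise via the inverse CDF tan(pi * (U - 1/2)).
data = [true_theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2000)]
start = statistics.median(data)  # a root-n consistent preliminary estimator
for k in range(4):
    est = k_step_parametric(data, start, k)
    avg_score = sum(cauchy_score(x, est) for x in data) / len(data)
    print(k, round(est, 6), f"{avg_score:.2e}")
```

The printed average score should shrink quickly from step to step, which is the behaviour that the parametric rates in (1.3) describe.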

The efficient score function $\tilde\ell_{\theta,\hat\eta_n}$ in (1.2) usually does not have an explicit
form or cannot be estimated explicitly, as discussed above. Hence, we estimate $\mathbb{P}_n\tilde\ell_{\theta,\hat\eta_n}$ and $\mathbb{P}_n\tilde\ell_{\theta,\hat\eta_n}\tilde\ell^{T}_{\theta,\hat\eta_n}$ via numerical derivatives of the profile likelihood. The profile likelihood $pl_n(\theta)$ is defined as $\sup_{\eta\in\mathcal{H}} lik_n(\theta,\eta)$, where $lik_n(\theta,\eta)$ is the full likelihood given $n$ observations. In practice, the profile likelihood may have an explicit form, e.g. in the Cox model with right censored data, or can be computed easily using procedures such as the fixed-point algorithm (as used in Kosorok, Lee and Fine (2004), for example) or, if $\eta$ is a monotone function, the iterative convex minorant algorithm introduced in Groeneboom (1991). Hence we will assume throughout this paper that evaluation of $pl_n(\theta)$ is computationally feasible. We shall consider the profile likelihood based K-step MLE of the form

$$\hat\theta_n^{(k)} = \hat\theta_n^{(k-1)} + \big(\Pi_n(\hat\theta_n^{(k-1)}, t_n)\big)^{-1}\,\Gamma_n(\hat\theta_n^{(k-1)}, s_n) \qquad (1.4)$$

for $k = 1, 2, \ldots$ and a reasonably accurate starting point $\hat\theta_n^{(0)}$. Here $\Gamma_n(\theta, s_n)$ and $\Pi_n(\theta, t_n)$ are discretized versions of the first derivative and of minus the second derivative of the normalized log-profile likelihood around $\theta$, with step sizes $s_n$ and $t_n$, respectively. Their forms are given and justified in Section 3. In Section 2, we provide some necessary background about semiparametric models and the two primary assumptions needed in this paper. In Section 3, we discuss the construction of the initial estimates and present the main result of the paper about the higher order convergence rate of the K-step semiparametric MLE. In Section 4, the proposed K-step approach is applied to three semiparametric models. Section 5 contains simulation results for the Cox regression model, and proofs are given in Section 6.

2. Background and Assumptions 

We assume the data $X_1, \ldots, X_n$ are i.i.d. throughout the paper. In what follows, we first briefly review the concept of the efficient score function and define the convergence rate for the nuisance functional parameter. Next, we present the two primary assumptions, namely second order asymptotic expansions of the log-profile likelihood and of the MLE.
2.1 Preliminary 

The score function for $\theta$, $\dot\ell_{\theta,\eta}$, is defined as the partial derivative with respect to $\theta$ of the log-likelihood of a single observation with $\eta$ held fixed. We denote the true values of $(\theta,\eta)$ by $(\theta_0,\eta_0)$. A score function for $\eta$ is of the form

$$\frac{\partial}{\partial t}\Big|_{t=0}\log p_{\theta,\eta_t}(x) = A_{\theta,\eta}h(x),$$

where $h$ is a "direction" along which $\eta_t \in \mathcal{H}$ approaches $\eta$, running through some index set $H$, and $A_{\theta,\eta}: H \to L_2^0(P_{\theta,\eta})$ is the score operator for $\eta$. The efficient score function for $\theta$ is defined as $\tilde\ell_{\theta,\eta} = \dot\ell_{\theta,\eta} - \Pi_{\theta,\eta}\dot\ell_{\theta,\eta}$, where $\Pi_{\theta,\eta}\dot\ell_{\theta,\eta}$ minimizes the squared distance $P_{\theta,\eta}(\dot\ell_{\theta,\eta} - k)^2$ over all functions $k$ in the closed linear span of the score functions for $\eta$ (the "nuisance scores"). The variance of $\tilde\ell_{\theta,\eta}$ is called the efficient information matrix $\tilde I_{\theta,\eta}$, and its inverse is the Cramer-Rao bound for estimating $\theta$ in the presence of the infinite dimensional nuisance parameter $\eta$. We also abbreviate $\tilde\ell_{\theta_0,\eta_0}$ and $\tilde I_{\theta_0,\eta_0}$ as $\tilde\ell_0$ and $\tilde I_0$, respectively. An insightful review of efficient score functions can be found in Chapter 3 of Kosorok (2007).

The maximum likelihood estimate for $(\theta,\eta)$ can be expressed as $(\hat\theta_n,\hat\eta_n)$, where $\hat\eta_n = \hat\eta_{\hat\theta_n}$ and $\hat\eta_\theta = \operatorname{argmax}_{\eta\in\mathcal{H}} lik_n(\theta,\eta)$. The convergence rate for $\eta$ is defined as the largest $r$ that satisfies $\|\hat\eta_{\tilde\theta_n} - \eta_0\| = O_P(\|\tilde\theta_n - \theta_0\| + n^{-r})$ for any random sequence $\tilde\theta_n = \theta_0 + o_P(1)$, where $\|\cdot\|$ is a norm whose definition depends on context: for a Euclidean vector $u$, $\|u\|$ is the Euclidean norm, and for an element $w$ of the nuisance parameter space $\mathcal{H}$, $\|w\|$ is some chosen norm on $\mathcal{H}$. In regular semiparametric models, which we can define without loss of generality to be models where the entropy integral converges, $r$ is always larger than 1/4. We say the nuisance parameter has the parametric rate if $r = 1/2$. For instance, the nuisance parameters in the three examples of Cheng and Kosorok (2006) achieve the parametric rate. More specifically, the nuisance parameter in the Cox model, which is the cumulative hazard function, has the parametric rate under right censored data. However, the convergence rate for the cumulative hazard becomes slower, namely $r = 1/3$, under current status data.
2.2 Assumptions 

The main result of this paper is based on the following second order asymptotic expansion (2.1) of the profile likelihood. For any random sequence $\tilde\theta_n = \theta_0 + o_P(1)$, Cheng and Kosorok (2007) prove that

$$\log pl_n(\tilde\theta_n) = \log pl_n(\theta_0) + (\tilde\theta_n - \theta_0)^T\sum_{i=1}^n\tilde\ell_0(X_i) - \frac{n}{2}(\tilde\theta_n - \theta_0)^T\tilde I_0(\tilde\theta_n - \theta_0) + o_P\big(g_r(\|\tilde\theta_n - \theta_0\|)\big), \qquad (2.1)$$

where $g_r(w) = (nw^3 \vee n^{1-2r}w \vee n^{-2r+1/2})\,1\{1/4 < r < 1/2\} + (nw^3 \vee n^{-1/2})\,1\{r \ge 1/2\}$, under certain second order no-bias conditions. Under similar conditions the
maximum likelihood estimate is asymptotically normal and has the asymptotic expansion

$$\sqrt n(\hat\theta_n - \theta_0) = \frac{1}{\sqrt n}\sum_{i=1}^n\tilde I_0^{-1}\tilde\ell_0(X_i) + O_P(n^{-1/2} \vee n^{-2r+1/2}), \qquad (2.2)$$

where $\tilde I_0$ is assumed to be strictly positive definite. Expansions (2.1) and (2.2) are essentially second order versions of conditions (1.4) and (1.5) in Murphy and Van der Vaart (2000), which justify using a semiparametric profile likelihood as an ordinary likelihood. Under the second order conditions specified in Section 2.3 of Cheng and Kosorok (2007), (2.1) and (2.2) have been shown to hold in several semiparametric models, e.g. Cox regression and the partly linear model (Cheng and Kosorok (2006) and Cheng and Kosorok (2007)). Therefore, we take (2.1) and (2.2) as the two primary assumptions for the remainder of the paper.
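A heuristic derivation, which ignores the remainder term, shows how the two assumptions fit together: setting the gradient of the quadratic part of (2.1) to zero recovers the leading term of (2.2).

```latex
\frac{\partial}{\partial\tilde\theta_n}\Big[(\tilde\theta_n-\theta_0)^T\sum_{i=1}^n\tilde\ell_0(X_i)
  - \frac{n}{2}(\tilde\theta_n-\theta_0)^T\tilde I_0(\tilde\theta_n-\theta_0)\Big]
  = \sum_{i=1}^n\tilde\ell_0(X_i) - n\,\tilde I_0(\tilde\theta_n-\theta_0) = 0
\;\Longrightarrow\;
\sqrt{n}(\tilde\theta_n-\theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n\tilde I_0^{-1}\tilde\ell_0(X_i).
```

The symmetry of $\tilde I_0$ is used in differentiating the quadratic form.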

3. Main Results 

We first present two general approaches to searching for the preliminary estimates. We then discuss how to construct the estimates of $\mathbb{P}_n\tilde\ell_{\theta,\hat\eta_n}$ and $\mathbb{P}_n\tilde\ell_{\theta,\hat\eta_n}\tilde\ell^T_{\theta,\hat\eta_n}$ used in (1.4), based on the profile likelihoods. Finally, the convergence rate of the K-step MLE is given. Such higher order convergence rate results are of particular interest in small- or moderate-sized samples. Conditions (2.1) and (2.2) are assumed to hold throughout this section.
3.1 Initial Estimate 

The start-up estimator in the above K-step approach is usually required to have reasonably good precision. In parametric models, $\hat\theta_n^{(0)}$ is required to be $\sqrt n$-consistent so that the one- and two-step MLE achieve the convergence rates shown in (1.3). In our semiparametric set-up, we need the initial estimate to be $n^\psi$-consistent for some $0 < \psi \le 1/2$. A $\sqrt n$-consistent estimate in parametric models can be obtained through M-estimation theory, e.g. Theorem 5.21 in Van der Vaart (1998), or derived case by case in different examples. In semiparametric models where ad-hoc estimation methods for $\hat\theta_n^{(0)}$ are unavailable, we provide two general search strategies for $\hat\theta_n^{(0)}$: one is through an MCMC sampling procedure called the profile sampler (Lee, Kosorok and Fine (2005)); the other is through a deterministic or stochastic grid search over the profile likelihood function.
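As a rough illustration of the profile sampler mentioned above, the following Python sketch runs a random-walk Metropolis chain whose target is proportional to $pl_n(\theta)$, i.e. a flat (Lebesgue) prior, for scalar $\theta$. The chain length, burn-in, jump scale, the use of $1/(n \times \text{sample variance})$ as an information estimate, and the toy profile likelihood (a normal mean with the variance profiled out) are assumptions made for the example; `profile_sampler` and `toy_log_pl` are hypothetical names.

```python
import math
import random


def profile_sampler(log_pl, theta_init, n, n_iter=5000, burn_in=1000, step=0.1, seed=0):
    """Random-walk Metropolis chain targeting pl_n(theta) under a flat prior (scalar theta).

    Returns the post-burn-in sample mean and 1 / (n * sample variance)."""
    rng = random.Random(seed)
    theta, cur = theta_init, log_pl(theta_init)
    draws = []
    for it in range(n_iter):
        prop = theta + rng.gauss(0.0, step)      # normal jump centred at the current state
        new = log_pl(prop)
        # Accept with probability min(1, pl_n(prop) / pl_n(theta)).
        if new >= cur or rng.random() < math.exp(new - cur):
            theta, cur = prop, new
        if it >= burn_in:
            draws.append(theta)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    return mean, 1.0 / (n * var)


# Hypothetical toy example: normal mean with the variance profiled out.
rng = random.Random(1)
x = [rng.gauss(2.0, 1.0) for _ in range(400)]
n = len(x)


def toy_log_pl(mu):
    # log pl_n(mu) up to an additive constant: -(n/2) log(mean squared residual).
    return -0.5 * n * math.log(sum((xi - mu) ** 2 for xi in x) / n)


theta0, info0 = profile_sampler(toy_log_pl, theta_init=0.0, n=n)
print(theta0, info0)
```

In this toy model the target is symmetric about the sample mean, so the chain mean should sit close to it, and the information estimate should be near one over the residual variance.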

The profile sampler is MCMC sampling from the posterior distribution based on the profile likelihood, and was proposed for the purpose of obtaining frequentist inference for $\theta$ (Lee, Kosorok and Fine (2005)). Here, however, we can use this convenient MCMC sampling procedure to yield a $\sqrt n$-consistent $\hat\theta_n^{(0)}$ and a consistent estimate of $\tilde I_0$. Specifically, under conditions (1.4) and (1.5) in Murphy and Van der Vaart (2000) and the mild conditions on the prior specified in Theorem 1 of Lee, Kosorok and Fine (2005), Lee, Kosorok and Fine (2005) show that

$$\hat E_{\theta|\tilde X}(\theta) = \hat\theta_n + o_P(n^{-1/2}), \qquad (3.1)$$

$$\hat I_n(PS) = \tilde I_0 + o_P(1), \qquad (3.2)$$

where $\hat E_{\theta|\tilde X}(\theta)$ is the sample mean of the profile sampler and $\hat I_n(PS)$ is the inverse of $n$ times its sample variance.

Next, we provide an alternative grid search method to construct a consistent start-up estimator when the above profile sampling procedure is unavailable or time consuming. When the dimension of $\theta$ is not large, we conduct a deterministic search of the objective function $Q_n(\theta) = \log pl_n(\theta)/n$ over a regularly spaced grid covering the whole compact parameter space $\Theta$. We summarize this idea in Theorem 1 below, for which we need to assume the asymptotic uniqueness of $\theta_0$:

$$Q_n(\tilde\theta_n) - Q_n(\theta_0) = o_P(1) \ \text{implies}\ \tilde\theta_n - \theta_0 = o_P(1) \qquad (3.3)$$

for any random sequence $\{\tilde\theta_n\} \subset \Theta$.



Theorem 1. Let $\mathcal D_n$ be a set of points $\theta^{(0)}$ regularly spaced throughout $\Theta$ with cardinality larger than $cn^{d\psi}$ for some $c > 0$. Suppose that the parameter space $\Theta$ is a compact subset of $\mathbb{R}^d$ and (3.3) holds. Then for $0 < \psi < 1/4$ we have

$$\hat\theta_n^{\mathcal D} - \theta_0 = O_P(n^{-\psi}), \qquad (3.4)$$

where $\hat\theta_n^{\mathcal D} = \operatorname{argmax}_{\mathcal D_n} Q_n(\theta)$.
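A sketch of the deterministic search in Theorem 1 for scalar $\theta$ ($d = 1$) follows. The constant `c`, the choice `psi = 0.2` (which satisfies $\psi < 1/4$), the search interval and the toy profile likelihood are assumptions for illustration; `grid_search_start` is a hypothetical helper.

```python
import math
import random


def grid_search_start(log_pl, lower, upper, n, psi=0.2, c=10.0):
    """Deterministic search for a scalar theta (d = 1): evaluate Q_n = log pl_n / n on
    about c * n**psi regularly spaced points of [lower, upper] and return the best one."""
    m = max(2, math.ceil(c * n ** psi))
    grid = [lower + (upper - lower) * j / (m - 1) for j in range(m)]
    return max(grid, key=lambda th: log_pl(th) / n)


# Hypothetical toy example: normal mean with the variance profiled out.
rng = random.Random(1)
x = [rng.gauss(2.0, 1.0) for _ in range(400)]
n = len(x)


def toy_log_pl(mu):
    return -0.5 * n * math.log(sum((xi - mu) ** 2 for xi in x) / n)


start = grid_search_start(toy_log_pl, -5.0, 5.0, n)
print(start)
```

Here the toy objective is unimodal, so the search simply returns the grid point nearest the sample mean; the spacing shrinks like $n^{-\psi}$ as the grid grows.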



However, if the dimension $d$ is very large, we prefer a stochastic search whose search points are realizations of an independent random variable $\tilde\theta$ with a strictly positive density around $\theta_0$.

Corollary 1. Assume that $\tilde\theta$ is independent of $Q_n(\theta)$ for all $\theta \in \Theta$ and admits a density with support $\Theta$ that is bounded away from zero in some neighborhood of $\theta_0$. Let $\mathcal S_n$ be a set of independent copies of $\tilde\theta$ with cardinality larger than $cn^{2\psi}$ for some $c > 0$. Suppose that the parameter space $\Theta$ is a compact subset of $\mathbb{R}^d$ and (3.3) holds. Then for $0 < \psi < 1/4$ we have

$$\hat\theta_n^{\mathcal S} - \theta_0 = O_P(n^{-\psi}), \qquad (3.5)$$

where $\hat\theta_n^{\mathcal S} = \operatorname{argmax}_{\mathcal S_n} Q_n(\theta)$.
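The stochastic search of Corollary 1 differs only in how the candidate points are generated. In this sketch the independent copies of $\tilde\theta$ are assumed to be uniform draws on the parameter interval (a density bounded away from zero near $\theta_0$), with about $c\,n^{2\psi}$ of them; the constants and the toy profile likelihood are illustrative assumptions, and `random_search_start` is a hypothetical name.

```python
import math
import random


def random_search_start(log_pl, lower, upper, n, psi=0.2, c=10.0, seed=0):
    """Stochastic search for a scalar theta: evaluate Q_n = log pl_n / n at about
    c * n**(2 * psi) independent Uniform(lower, upper) draws and return the best one."""
    rng = random.Random(seed)
    m = max(1, math.ceil(c * n ** (2 * psi)))
    candidates = [rng.uniform(lower, upper) for _ in range(m)]
    return max(candidates, key=lambda th: log_pl(th) / n)


# Hypothetical toy example: normal mean with the variance profiled out.
rng = random.Random(1)
x = [rng.gauss(2.0, 1.0) for _ in range(400)]
n = len(x)


def toy_log_pl(mu):
    return -0.5 * n * math.log(sum((xi - mu) ** 2 for xi in x) / n)


start = random_search_start(toy_log_pl, -5.0, 5.0, n)
print(start)
```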



3.2 K-step MLE 

Before proceeding to give the convergence rate of the K-step MLE, we first specify the forms of $\Gamma_n(\theta,s_n)$ and $\Pi_n(\theta,t_n)$ in (1.4). The intuitive idea behind the construction of the estimators of $\mathbb{P}_n\tilde\ell_{\theta,\hat\eta_n}$ and $\mathbb{P}_n\tilde\ell_{\theta,\hat\eta_n}\tilde\ell^T_{\theta,\hat\eta_n}$ is to use $\hat\eta_\theta$ as $\hat\eta_n$ when making inferences about $\theta$.

Specifically, the $i$th component of $\Gamma_n(\theta,s_n)$ is constructed as

$$[\Gamma_n(\theta,s_n)]_i = \mathbb{P}_n\left(\frac{\log lik(\theta + s_nv_i, \hat\eta_{\theta+s_nv_i}) - \log lik(\theta,\hat\eta_\theta)}{s_n}\right) = \frac{\log pl_n(\theta + s_nv_i) - \log pl_n(\theta)}{ns_n}, \qquad (3.6)$$

where the step size $s_n \to 0$ and $v_i$ denotes the $i$th unit vector in $\mathbb{R}^d$. Following similar logic, we define the $(i,j)$th component of $\Pi_n(\theta,t_n)$ as

$$[\Pi_n(\theta,t_n)]_{ij} = -\frac{\log pl_n(\theta + v_it_n + v_jt_n) + \log pl_n(\theta) - \log pl_n(\theta + v_it_n) - \log pl_n(\theta + v_jt_n)}{nt_n^2}, \qquad (3.7)$$

where the step size $t_n \to 0$. The quantity (3.7) is also called the observed profile information in Murphy and Van der Vaart (1999). Lemma 1 in the appendix justifies the use of (3.6) and (3.7) as consistent estimates of $\mathbb{P}_n\tilde\ell_0$ and $\tilde I_0$, respectively.
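The estimators (3.6) and (3.7) and the iteration (1.4) only require evaluations of $\log pl_n$. The Python sketch below implements them for small $d$, using forward differences as in (3.6)-(3.7) and a plain Gaussian elimination solver for $\Pi_n^{-1}\Gamma_n$. The toy profile likelihood (linear regression with the error variance profiled out, whose maximizer is the least squares fit), the starting value and the step sizes $s_n = n^{-3/4}$, $t_n = n^{-1/2}$ are assumptions for illustration; all function names are hypothetical.

```python
import math
import random


def gamma_n(log_pl, theta, s, n):
    """(3.6): [Gamma_n]_i = (log pl_n(theta + s v_i) - log pl_n(theta)) / (n s)."""
    base = log_pl(theta)
    out = []
    for i in range(len(theta)):
        shifted = list(theta)
        shifted[i] += s
        out.append((log_pl(shifted) - base) / (n * s))
    return out


def pi_n(log_pl, theta, t, n):
    """(3.7): observed profile information from second-order forward differences."""
    d = len(theta)

    def at(*idx):
        # log pl_n at theta + t * (sum of the unit vectors v_i for i in idx)
        p = list(theta)
        for i in idx:
            p[i] += t
        return log_pl(p)

    base = log_pl(theta)
    single = [at(i) for i in range(d)]
    return [[-(at(i, j) + base - single[i] - single[j]) / (n * t * t)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small d)."""
    d = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        x[r] = (m[r][d] - sum(m[r][c] * x[c] for c in range(r + 1, d))) / m[r][r]
    return x


def k_step_profile(log_pl, theta0, n, k, s, t):
    """Iterate (1.4) k times: theta <- theta + Pi_n(theta, t)^{-1} Gamma_n(theta, s)."""
    theta = list(theta0)
    for _ in range(k):
        step = solve(pi_n(log_pl, theta, t, n), gamma_n(log_pl, theta, s, n))
        theta = [a + b for a, b in zip(theta, step)]
    return theta


# Hypothetical toy example: y = b0 + b1 x + N(0, 1) with the variance profiled out.
rng = random.Random(7)
n = 500
xs = [rng.random() for _ in range(n)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]


def toy_log_pl(beta):
    rss = sum((y - beta[0] - beta[1] * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(rss / n)


est = k_step_profile(toy_log_pl, [0.5, 2.5], n, k=5, s=n ** -0.75, t=n ** -0.5)
print(est)
```

Because (3.6) is a one-sided difference, the fixed point of this iteration differs from the exact maximizer by a term of order $s_n$, so agreement with the least squares fit should only be expected up to roughly that order.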

The convergence rate of the K-step MLE is clearly determined by the orders of the step sizes in the numerical differentiations $\Gamma_n(\cdot,s_n)$ and $\Pi_n(\cdot,t_n)$ above. However, we are mostly interested in the fastest convergence rate the K-step MLE can attain. Hence, in Theorem 2 below we assume the optimal step sizes $(s_n,t_n)$, under which the fastest convergence rate of $\hat\theta_n^{(k)}$ is achieved. As the theoretical basis for using the K-step approach in practice, Theorem 2 first presents the convergence rate of the fully iterated estimate $\hat\theta_n^{(\infty)}$, called the optimal rate of the K-step MLE, and then gives the number of iterations of (1.4) needed for $\hat\theta_n^{(k)}$ to attain this optimal rate. Note that, by the proof of Theorem 2, the asymptotic efficiency of $\hat\theta_n^{(k)}$ improves continuously throughout the iterative procedure until it reaches the optimal bound.

Theorem 2. Assume that $\hat\theta_n^{(k)}$ is defined as in (1.4) and $\hat\theta_n^{(0)}$ is $n^\psi$-consistent for $0 < \psi \le 1/2$. Then

$$\hat\theta_n^{(\infty)} - \hat\theta_n = O_P(n^{-3/4} \vee n^{-r-1/4}). \qquad (3.8)$$

Moreover, the above optimal rate can be achieved after $N$ ($M$) iterations of (1.4) starting from $\hat\theta_n^{(0)}$ when $r \ge 1/2$ ($1/4 < r < 1/2$):

$$\hat\theta_n^{(N)} - \hat\theta_n = O_P(n^{-3/4}), \qquad (3.9)$$

$$\hat\theta_n^{(M)} - \hat\theta_n = O_P(n^{-r-1/4}), \qquad (3.10)$$

where $N = \mathrm{int}[\log(2\psi)/\log(2/3)] + 1$, $M = \mathrm{int}[\log(\psi/r)/\log(2/3)] + \mathrm{int}[\log(4r/(4r-1))/\log 2 - 1] + 1$, and $\mathrm{int}[x]$ denotes the smallest nonnegative integer larger than or equal to $x$.
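The iteration counts $N$ and $M$ in Theorem 2 are simple arithmetic, and the following Python sketch evaluates them. The small tolerance inside `int_ceil` is our own numerical safeguard, not part of the theorem, against floating-point error when the argument is an integer in exact arithmetic. The calls correspond to the single step used for right censored Cox data ($\psi = r = 1/2$) and to the three and four steps quoted in Sections 4.2 and 4.3 ($\psi = 1/4$ with $r = 1/3$ and $r = 2/5$).

```python
import math


def int_ceil(x, tol=1e-9):
    """int[x] of Theorem 2: the smallest nonnegative integer >= x.

    The tolerance absorbs floating-point noise such as log(4)/log(2) - 1 = 1.0000000000000004."""
    return max(0, math.ceil(x - tol))


def iterations_needed(psi, r):
    """N (r >= 1/2) or M (1/4 < r < 1/2) from Theorem 2 for an n^psi-consistent start."""
    if r >= 0.5:
        return int_ceil(math.log(2 * psi) / math.log(2 / 3)) + 1
    m1 = int_ceil(math.log(psi / r) / math.log(2 / 3))
    m2 = int_ceil(math.log(4 * r / (4 * r - 1)) / math.log(2) - 1)
    return m1 + m2 + 1


print(iterations_needed(0.5, 0.5))     # 1
print(iterations_needed(0.25, 1 / 3))  # 3
print(iterations_needed(0.25, 0.4))    # 4
```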
From Theorems 1 and 2, it is not surprising to find that there exists a tradeoff between the number of search grid points and the number of iterations. Combining (3.9) and (3.10) with (2.2), we have the following asymptotic expansion of the K-step MLE:

$$\sqrt n(\hat\theta_n^{(k)} - \theta_0) = \frac{1}{\sqrt n}\sum_{i=1}^n\tilde I_0^{-1}\tilde\ell_0(X_i) + O_P(n^{-1/4} \vee n^{-r+1/4}),$$

where $k = M$ or $N$. Thus we can construct the two-sided asymptotically correct $(1-\alpha)$ confidence interval for $\theta$ based on the K-step MLE, namely $(\hat\theta_n^{(k)} - z_{1-\alpha/2}/\sqrt{n\hat I},\ \hat\theta_n^{(k)} + z_{1-\alpha/2}/\sqrt{n\hat I})$, where $k = M$ or $N$, $z_\alpha$ is the standard normal $\alpha$-th quantile, and $\hat I$ is a consistent estimator of $\tilde I_0$.
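The confidence interval above can be computed directly once the K-step MLE and an estimate $\hat I$ of $\tilde I_0$ are available, for instance $\Pi_n$ from (3.7) or the profile sampler estimate (3.2). A minimal Python sketch for scalar $\theta$, with a hypothetical function name and illustrative inputs:

```python
import math
from statistics import NormalDist


def k_step_ci(theta_k, info_hat, n, alpha=0.05):
    """Two-sided (1 - alpha) interval theta_k -/+ z_{1-alpha/2} / sqrt(n * info_hat)
    for scalar theta, given a K-step MLE theta_k and a consistent estimate info_hat."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z / math.sqrt(n * info_hat)
    return theta_k - half_width, theta_k + half_width


print(k_step_ci(1.0, 0.25, 100))  # roughly (0.608, 1.392)
```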

Remark 1. Recall that in parametric models, Janssen, Jureckova and Veraverbeke (1985) show that $\hat\theta_n^{(1)} - \hat\theta_n = O_P(n^{-1})$ and $\hat\theta_n^{(2)} - \hat\theta_n = O_P(n^{-3/2})$. However, the optimal rate for the K-step MLE is slower, even in semiparametric models with the parametric convergence rate. Such efficiency loss can be partially explained by the lower smoothness of the profile likelihood in semiparametric models. In other words, the corresponding estimators of the score function and information matrix in the K-step parametric MLE usually have bias of smaller order.

4. Examples 

In this section, the above K-step estimation approach is illustrated with three semiparametric models with different convergence rates. Under the model assumptions specified in Section 5 of Cheng and Kosorok (2007), Cheng and Kosorok (2007) show that (2.1) and (2.2) hold in all the examples. Hence, we only briefly review the model set-up here and then discuss the choice of the initial estimates. Finally, we apply Theorem 2 to determine the least number of iterations of the K-step MLE needed to achieve full efficiency, in the sense of attaining the optimal rate of Theorem 2.
4.1 Cox regression with right censored data 

In the Cox regression model, the hazard function of the survival time $T$ of a subject with covariate $Z$ is expressed as

$$\lambda(t|z) = \lim_{\Delta\to 0}\frac{1}{\Delta}\Pr(t \le T < t + \Delta \mid T \ge t, Z = z) = \lambda(t)\exp(\theta z), \qquad (4.1)$$

where $\lambda$ is an unspecified baseline hazard function and $\theta$ is the regression parameter (Cox (1972)). Under right censoring, we only know whether the event time $T$ occurred before or after the censoring time $C$. More precisely, the observed data are $X = (Y, \delta, Z)$, where $Y = T \wedge C$, $\delta = I\{T \le C\}$, and $Z \in \mathcal Z \subset \mathbb{R}$ is a regression covariate. In the Cox regression model, we are usually interested in the regression parameter $\theta$, while treating the cumulative hazard function $\eta$ as the nuisance parameter. Thus we express the likelihood for $(\theta,\eta)$ as

$$lik(\theta,\eta) = \big(e^{\theta z}\eta\{y\}e^{-e^{\theta z}\eta(y)}\big)^{\delta}\big(e^{-e^{\theta z}\eta(y)}\big)^{1-\delta} \qquad (4.2)$$

by replacing the hazard function $\lambda(y)$ with the point mass $\eta\{y\}$. Owing to the special structure of the Cox model, the log-profile likelihood has, up to an additive constant, the explicit form

$$\log pl_n(\theta) = \sum_{i=1}^{l}\Big(\theta z_{(i)} - \log\sum_{j\in R_i}e^{\theta z_j}\Big), \qquad (4.3)$$

where $l$ is the number of observed events, $R_i = \{j : Y_j \ge t_i\}$, $t_i$ is the observed value of the $i$th ordered event time and $z_{(i)}$ is the covariate corresponding to $t_i$. The convergence rate of the estimated nuisance parameter is established in Theorem 3.1 of Murphy and Van der Vaart (1999):

$$\|\hat\eta_{\tilde\theta_n} - \eta_0\|_\infty = O_P(n^{-1/2} + \|\tilde\theta_n - \theta_0\|), \qquad (4.4)$$

where $\|\cdot\|_\infty$ denotes the uniform norm.

In this model, the profile sampler can be generated very fast because of the explicit form of the profile likelihood. Hence, we use it to produce the root-$n$ consistent start-up estimator. By Theorem 2 we can conclude that $\hat\theta_n^{(1)} - \hat\theta_n = O_P(n^{-3/4})$, where $\hat\theta_n^{(1)}$ is constructed according to (1.4).
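For the Cox model, (4.3) can be evaluated directly, so (1.4) needs no inner optimization. The sketch below assumes a scalar covariate and no tied observation times, and uses step sizes $s_n = n^{-3/4}$ and $t_n = n^{-1/2}$ (the orders mentioned in Section 5) with all constants set to one, which is an assumption. The function names are hypothetical, and the quadratic-time loop is written for clarity rather than speed.

```python
import math


def cox_log_pl(theta, y, delta, z):
    """Log-profile likelihood (4.3) for scalar theta and covariate, assuming no tied times;
    the Breslow profile likelihood differs from this only by an additive constant."""
    total = 0.0
    for i in range(len(y)):
        if delta[i] == 1:  # sum over observed event times t_i = y[i]
            risk = sum(math.exp(theta * z[j]) for j in range(len(y)) if y[j] >= y[i])
            total += theta * z[i] - math.log(risk)
    return total


def one_step_cox(theta0, y, delta, z):
    """One update of (1.4) for scalar theta with s_n = n^{-3/4} and t_n = n^{-1/2}."""
    n = len(y)
    s, t = n ** -0.75, n ** -0.5

    def f(th):
        return cox_log_pl(th, y, delta, z)

    gamma = (f(theta0 + s) - f(theta0)) / (n * s)                            # (3.6)
    pi = -(f(theta0 + 2 * t) + f(theta0) - 2 * f(theta0 + t)) / (n * t * t)  # (3.7), d = 1
    return theta0 + gamma / pi


# Tiny check of (4.3): at theta = 0 the two event terms are -log 3 and -log 2.
print(cox_log_pl(0.0, [1.0, 2.0, 3.0], [1, 1, 0], [0.5, -1.0, 2.0]))  # about -1.7918 = -log 6
```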
4.2 Cox regression for current status data 

Current status data arise when each subject is observed at a single examination time, $Y$, to determine whether an event has occurred, so the event time $T$ cannot be known exactly. The observed data are then $n$ i.i.d. realizations of $X = (Y, \delta, Z) \in \mathbb{R}^+ \times \{0,1\} \times \mathbb{R}$, where $\delta = I\{T \le Y\}$ and $Z$ is a covariate. It is not difficult to derive the log-likelihood

$$\log lik_n(\theta,\eta) = \sum_{i=1}^n\Big[\delta_i\log\big(1 - \exp(-\eta(Y_i)e^{\theta Z_i})\big) - (1-\delta_i)e^{\theta Z_i}\eta(Y_i)\Big]. \qquad (4.5)$$

Moreover, using entropy methods, Murphy and Van der Vaart (1999) extend earlier results of Huang (1996) and show that

$$\|\hat\eta_{\tilde\theta_n} - \eta_0\|_{L_2} = O_P(\|\tilde\theta_n - \theta_0\| + n^{-1/3}), \qquad (4.6)$$

where $\|\cdot\|_{L_2}$ is the $L_2$ norm with respect to the distribution of $Y$.

In the Cox regression model with current status data, the iterative convex minorant algorithm (Huang (1996)) is used to compute the profile likelihood. The MCMC sampling procedure thus becomes more time consuming because of this iterative computation. Hence, we prefer the grid search approach to obtain an $n^{1/4}$-consistent preliminary estimate. By Theorem 2, the three-step MLE attains the optimal rate, i.e. $\hat\theta_n^{(3)} - \hat\theta_n = O_P(n^{-7/12})$.
4.3 The partly linear model 

In this model, a continuous outcome $Y$, conditional on the covariates $(W, Z) \in \mathbb{R}^d \times \mathbb{R}$, is modeled as

$$Y = \theta^TW + k(Z) + \epsilon, \qquad (4.7)$$

where $k$ is an unknown smooth function and $\epsilon \sim N(0, 1)$. The functional nuisance parameter $k$ is assumed to belong to $\mathcal H_2 = \{f : J_2^2(f) + \|f\|_\infty \le M, \text{ for a known } M < \infty\}$, where $J_2(f)$ is the second order Sobolev norm of $f$. However, the response $Y$ is not observed directly; only its current status is observed at a random censoring time $C \in \mathbb{R}$. In other words, we observe $X = (C, \Delta, W, Z)$, where $\Delta = 1\{Y \le C\}$. Additionally, $Y$ and $C$ are assumed to be independent given $(W, Z)$. Under model (4.7), the log-likelihood for a single observation at $X = x = (c, \delta, w, z)$ can be shown to have the form

$$\log lik_{\theta,k}(x) = \delta\log\{\Phi(c - \theta^Tw - k(z))\} + (1 - \delta)\log\{1 - \Phi(c - \theta^Tw - k(z))\}, \qquad (4.8)$$

where $\Phi$ is the standard normal distribution function. In Lemma 4 of Cheng and Kosorok (2007), we have shown that

$$\|\hat k_{\tilde\theta_n} - k_0\|_{L_2} = O_P(n^{-2/5} + \|\tilde\theta_n - \theta_0\|). \qquad (4.9)$$

The rate $r = 2/5$ is clearly faster than the cube-root rate $n^{-1/3}$ but slower than the parametric rate. Depending on the dimension of $\theta$, we can choose the deterministic or the random search for the starting estimate. Similarly, we can show that four iterations are needed to achieve the optimal rate, i.e. $\hat\theta_n^{(4)} - \hat\theta_n = O_P(n^{-13/20})$, if $\hat\theta_n^{(0)}$ is $n^{1/4}$-consistent.
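The single-observation log-likelihood (4.8) is straightforward to code. In this sketch the nuisance function $k$ is passed in as an arbitrary callable, which is an assumption for illustration (in the model it is an unknown element of the Sobolev-type class above), and $\Phi(-u) = 1 - \Phi(u)$ is used for the $\delta = 0$ term to avoid cancellation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # Phi, the standard normal distribution


def loglik_partly_linear(theta, k, x):
    """Single-observation log-likelihood (4.8): x = (c, delta, w, z), theta and w are
    equal-length sequences, and k is a callable standing in for the nuisance function."""
    c, delta, w, z = x
    u = c - sum(th * wi for th, wi in zip(theta, w)) - k(z)  # c - theta^T w - k(z)
    # P(Y <= c | w, z) = Phi(u); the censored-above case uses Phi(-u) = 1 - Phi(u).
    return math.log(_STD_NORMAL.cdf(u)) if delta == 1 else math.log(_STD_NORMAL.cdf(-u))


print(loglik_partly_linear([1.0], lambda z: z ** 2, (1.25, 1, [1.0], 0.5)))  # log(0.5)
```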
5. Simulations 

It is of interest to see how good the K-step MLE is at finite sample sizes in comparison with the regular MLE. Hence, in this section we conducted simulations in the Cox regression model with right censored data and with current status data. The simulation results presented in Tables 1 and 2 agree with our theoretical results given in Subsections 4.1 and 4.2.

We first ran the simulations for various sample sizes in the Cox model with right censored data. As indicated in Subsection 4.1, we can construct $\hat\theta_n^{(1)}$ in the form of (1.4) with $(s_n^*, t_n^*)$ set to be proportional to $(n^{-3/4}, n^{-1/2})$, according to the proof of Theorem 2. The profile sampler is generated under a Lebesgue prior. For each sample size, 500 datasets were analyzed. The event times were generated from (4.1) with one covariate $Z \sim U[0, 1]$. The regression coefficient is $\theta = 1$ and $\eta(t) = \exp(t) - 1$. The censoring time is $C \sim U[0, \tau_n]$, where $\tau_n$ was chosen such that the average effective sample size over 500 samples is approximately $0.9n$. For each dataset, Markov chains of length 5,000 with a burn-in period of 1,000 were generated using the Metropolis algorithm. The jumping density for the coefficient was normal, centered at the current iteration, with variance tuned to yield an acceptance rate of 20%-40%. In the Cox regression with current status data, we first use a deterministic search over $[-5, 5]$ for the $n^{1/4}$-consistent $\hat\theta_n^{(0)}$. The three-step MLE is then generated iteratively according to (1.4), in which the order of $(s_n^*, t_n^*)$ at each step is specified in the proof of Theorem 2.
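The data-generating design for right censored Cox data described above can be sketched as follows. The inversion step uses the fact that $\eta(T)e^{\theta Z}$ is standard exponential under (4.1); the censoring upper limit `tau = 3.0` is a placeholder assumption, whereas the simulations choose this limit so that the average effective sample size is about $0.9n$, which this sketch does not attempt. The function name is hypothetical.

```python
import math
import random


def simulate_cox_right_censored(n, theta=1.0, tau=3.0, seed=0):
    """Draw (Y, delta, Z) with Z ~ U[0, 1], cumulative baseline hazard eta(t) = exp(t) - 1,
    and censoring C ~ U[0, tau].

    Since eta(T) * exp(theta * Z) ~ Exp(1), T = log(1 + E * exp(-theta * Z)) with E ~ Exp(1)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.random()
        t = math.log1p(rng.expovariate(1.0) * math.exp(-theta * z))
        c = rng.uniform(0.0, tau)
        sample.append((min(t, c), 1 if t <= c else 0, z))
    return sample


data = simulate_cox_right_censored(200)
print(sum(d for _, d, _ in data) / len(data))  # observed fraction of uncensored times
```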

In the appendix, Table 1 (Table 2) summarizes the results of the simulations of Cox regression with right censored data (current status data), giving the average across 500 samples of the K-step MLE and of the maximum likelihood estimate (MLE). According to Theorem 2, $n^{3/4}|\hat\theta_n^{(1)} - \hat\theta_n|$ ($n^{7/12}|\hat\theta_n^{(3)} - \hat\theta_n|$) in the Cox model with right censored data (current status data) is bounded in probability, and the realizations of these terms summarized in Tables 1 and 2 clearly illustrate their boundedness. For each sample size, we can clearly observe that the K-step MLE approaches $\hat\theta_n$ after every iteration. Hence we conclude that the numerical evidence in this section supports our theoretical results.

Acknowledgment. The author thanks Dr. Michael Kosorok for several insightful discussions.

6. Appendix 

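In Lemma 1 below, we first provide a key technical tool for deriving the higher order convergence rate of the K-step MLE. The symbol $R_n \asymp q_n$ means that $R_n = O_P(q_n)$ and $R_n^{-1} = O_P(q_n^{-1})$, where $q_n \to 0$.

Lemma 1. Assume conditions (2.1) and (2.2), and that $\hat\theta_n^{(0)}$ is an $n^\psi$-consistent estimate for $0 < \psi \le 1/2$. Then we have

$$\Gamma_n(\hat\theta_n^{(0)}, s_n) = \mathbb{P}_n\tilde\ell_0 - \tilde I_0(\hat\theta_n^{(0)} - \theta_0) + O_P\Big(|s_n| \vee \frac{g_r(n^{-\psi} \vee |s_n|)}{n|s_n|}\Big), \qquad (6.1)$$

$$\Gamma_n(\hat\theta_n^{(0)} + U_n, s_n) - \Gamma_n(\hat\theta_n^{(0)}, s_n) = -\tilde I_0U_n + O_P\Big(|s_n| \vee \frac{g_r(n^{-\psi} \vee |s_n| \vee \|U_n\|)}{n|s_n|}\Big), \qquad (6.2)$$

$$\Pi_n(\tilde\theta_n, t_n) - \tilde I_0 = O_P\Big(\frac{g_r(r_n \vee |t_n|)}{nt_n^2}\Big), \qquad (6.3)$$

where $U_n = O_P(n^{-s})$ for some $s > 0$, $\tilde\theta_n - \hat\theta_n = O_P(r_n)$ and $g_r(w) = (nw^3 \vee n^{1-2r}w \vee n^{-2r+1/2})\,1\{1/4 < r < 1/2\} + (nw^3 \vee n^{-1/2})\,1\{r \ge 1/2\}$.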
Proof of Lemma 1: (2.1) implies that

$$\log pl_n(\hat\theta_n^{(0)} + V_n + s_nv_i) = \log pl_n(\theta_0) + (\hat\theta_n^{(0)} + V_n + s_nv_i - \theta_0)^T\sum_{j=1}^n\tilde\ell_0(X_j) - \frac n2(\hat\theta_n^{(0)} + V_n + s_nv_i - \theta_0)^T\tilde I_0(\hat\theta_n^{(0)} + V_n + s_nv_i - \theta_0) + O_P\big(g_r(n^{-\psi} \vee |s_n| \vee \|V_n\|)\big),$$

$$\log pl_n(\hat\theta_n^{(0)} + V_n) = \log pl_n(\theta_0) + (\hat\theta_n^{(0)} + V_n - \theta_0)^T\sum_{i=1}^n\tilde\ell_0(X_i) - \frac n2(\hat\theta_n^{(0)} + V_n - \theta_0)^T\tilde I_0(\hat\theta_n^{(0)} + V_n - \theta_0) + O_P\big(g_r(n^{-\psi} \vee \|V_n\|)\big),$$

for any random vector $V_n = o_P(1)$ and $s_n \to 0$. Combining the above two expansions and (3.6), we have

$$\Gamma_n(\hat\theta_n^{(0)} + V_n, s_n) = \mathbb{P}_n\tilde\ell_0 - \tilde I_0(\hat\theta_n^{(0)} + V_n - \theta_0) + O_P\Big(|s_n| \vee \frac{g_r(n^{-\psi} \vee |s_n| \vee \|V_n\|)}{n|s_n|}\Big).$$

By setting $V_n = 0$ and $V_n = U_n$ in the above equation, we obtain (6.1) and (6.2), respectively. Taking into account (2.1) and (2.2), we can prove the following second order asymptotic expansion of the profile likelihood around $\hat\theta_n$:

$$\log pl_n(\tilde\theta_n) = \log pl_n(\hat\theta_n) - \frac n2(\tilde\theta_n - \hat\theta_n)^T\tilde I_0(\tilde\theta_n - \hat\theta_n) + O_P\big(g_r(\|\tilde\theta_n - \hat\theta_n\|)\big) \qquad (6.4)$$

for any sequence $\tilde\theta_n = \hat\theta_n + o_P(1)$. Following an analysis similar to the above, (3.7) and (6.4) yield (6.3). This completes the proof. □

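Proof of Theorem 1: (2.1) implies that for $\tilde\theta_n - \theta_0 = o_P(1)$

$$Q_n(\tilde\theta_n) = Q_n(\theta_0) + (\tilde\theta_n - \theta_0)^T\mathbb{P}_n\tilde\ell_0 - \frac12(\tilde\theta_n - \theta_0)^T\tilde I_0(\tilde\theta_n - \theta_0) + \Delta_n, \qquad (6.5)$$

where $\Delta_n = O_P(g_r(\|\tilde\theta_n - \theta_0\|))/n$. We then show that $P(\|\hat\theta_n^{\mathcal D} - \theta_0\| > Cn^{-\psi}) \to 0$ for some $C > 0$ by the following set of inequalities. Set $\mathcal N_n = \{\theta : \|\theta - \theta_0\| \le Cn^{-\psi}\}$ and let $\mathcal N_n^c$ denote its complement. Note that $\mathcal D_n \cap \mathcal N_n \ne \emptyset$ for $C$ large enough and $\mathcal D_n \cap \mathcal N_n^c \ne \emptyset$ for $n$ large enough. Then

$$\begin{aligned}
P(\hat\theta_n^{\mathcal D} \in \mathcal N_n^c) &\le P\Big(\max_{\mathcal D_n\cap\mathcal N_n} Q_n(\theta) \le \max_{\mathcal D_n\cap\mathcal N_n^c} Q_n(\theta)\Big)\\
&\le P\Big(\max_{\mathcal D_n\cap\mathcal N_n} Q_n(\theta) \le Q_n(\theta_0) - C_1n^{-2\psi}\Big) + P\Big(\max_{\mathcal D_n\cap\mathcal N_n^c} Q_n(\theta) \ge Q_n(\theta_0) - C_1n^{-2\psi}\Big)\\
&= P\Big(\max_{\mathcal D_n\cap\mathcal N_n}\sqrt n\big(Q_n(\theta) - Q_n(\theta_0)\big) \le -C_1n^{1/2-2\psi}\Big) + P\Big(\max_{\mathcal D_n\cap\mathcal N_n^c}\sqrt n\big(Q_n(\theta) - Q_n(\theta_0)\big) \ge -C_1n^{1/2-2\psi}\Big),
\end{aligned}$$

where $C_1$ is some positive number. The first inequality above follows from the definition of $\hat\theta_n^{\mathcal D}$. Based on (6.5), we have

$$\begin{aligned}
&P\Big(\max_{\mathcal D_n\cap\mathcal N_n}\sqrt n\big(Q_n(\theta) - Q_n(\theta_0)\big) \le -C_1n^{1/2-2\psi}\Big)\\
&\quad= P\Big(\max_{\mathcal D_n\cap\mathcal N_n}\Big\{(\theta - \theta_0)^T\sqrt n\,\mathbb{P}_n\tilde\ell_0 - \frac{\sqrt n}{2}(\theta - \theta_0)^T\tilde I_0(\theta - \theta_0) + \sqrt n\Delta_n\Big\} \le -C_1n^{1/2-2\psi}\Big)\\
&\quad\le P\Big(\max_{\mathcal D_n\cap\mathcal N_n}(\theta - \theta_0)^T(-\sqrt n\,\mathbb{P}_n\tilde\ell_0) + \max_{\mathcal D_n\cap\mathcal N_n}\frac{\sqrt n}{2}(\theta - \theta_0)^T\tilde I_0(\theta - \theta_0) + \max_{\mathcal D_n\cap\mathcal N_n}(-\sqrt n\Delta_n) \ge C_1n^{1/2-2\psi}\Big)\\
&\quad\le P\Big(C\|\sqrt n\,\mathbb{P}_n\tilde\ell_0\| \ge (C_1 - \delta C^2/2)n^{1/2-\psi} + O_P(n^{1/2-2\psi})\Big),
\end{aligned}$$

where $\delta$ is the largest eigenvalue of $\tilde I_0$. The last inequality above follows from the compactness of $\Theta$. Let $\theta_n^* = \operatorname{argmax}_{\mathcal D_n\cap\mathcal N_n^c} Q_n(\theta)$. Condition (3.3) implies that $\theta_n^* - \theta_0 = o_P(1)$ since $Q_n(\theta_n^*) - Q_n(\theta_0) = o_P(1)$. Thus, by (6.5), we have

$$\begin{aligned}
P\Big(\max_{\mathcal D_n\cap\mathcal N_n^c}\sqrt n\big(Q_n(\theta) - Q_n(\theta_0)\big) \ge -C_1n^{1/2-2\psi}\Big) &= P\Big(\sqrt n\big(Q_n(\theta_n^*) - Q_n(\theta_0)\big) \ge -C_1n^{1/2-2\psi}\Big)\\
&\le P\Big(K_2\sqrt n\,\mathbb{P}_n\tilde\ell_0 \ge (\delta K_1^2/2 - C_1)n^{1/2-\psi} + O_P(n^{1/2-2\psi})\Big).
\end{aligned}$$

Note that $\theta_n^*$ belongs to the regularly spaced grid set $\mathcal D_n$ and $\|\theta_n^* - \theta_0\| > Cn^{-\psi}$. Therefore, from (6.5) and the construction of $\mathcal D_n$, we can conclude that $\theta_n^*$ should be the grid point closest to $\theta_0$ outside $\mathcal N_n$, i.e. $K_1n^{-\psi} \le \|\theta_n^* - \theta_0\| \le K_2n^{-\psi}$, where $C < K_1 < K_2 < 2C$ for large $C$. Without loss of generality, we assume that $\theta$ is scalar and $\theta_n^* > \theta_0$. Thus the last inequality above follows. Note that $\sqrt n\,\mathbb{P}_n\tilde\ell_0 = O_P(1)$ and $\psi < 1/4$. By choosing sufficiently large $C$ and $C_1$ while keeping the inequality $\delta C^2/2 < C_1 < \delta K_1^2/2$, we obtain $P(\hat\theta_n^{\mathcal D} \in \mathcal N_n^c) \to 0$ from the above inequalities. □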

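Proof of Corollary 1: The proof is similar to that of Theorem 1. We still need to show that $P(\|\hat\theta_n^{\mathcal S} - \theta_0\| > Cn^{-\psi}) \to 0$ for some $C > 0$. Similarly, we have

$$\begin{aligned}
P(\hat\theta_n^{\mathcal S} \in \mathcal N_n^c) &\le E\Big[P\Big(\max_{\mathcal S_n\cap\mathcal N_n} Q_n(\theta) \le \max_{\mathcal S_n\cap\mathcal N_n^c} Q_n(\theta)\,\Big|\,\mathcal S_n\Big)\Big]\\
&\le E\Big[P\Big(\max_{\mathcal S_n\cap\mathcal N_n}\sqrt n\big(Q_n(\theta) - Q_n(\theta_0)\big) \le -C_1n^{1/2-2\psi}\,\Big|\,\mathcal S_n\Big)\Big] + P\Big(\sup_{\mathcal N_n^c}\sqrt n\big(Q_n(\theta) - Q_n(\theta_0)\big) \ge -C_1n^{1/2-2\psi}\Big)\\
&\le P\Big(C\|\sqrt n\,\mathbb{P}_n\tilde\ell_0\| \ge (C_1/2)n^{1/2-\psi} + O_P(n^{1/2-2\psi})\Big) + P\Big(\sup_{\mathcal N_n^c}\sqrt n\big(Q_n(\theta) - Q_n(\theta_0)\big) \ge -C_1n^{1/2-2\psi}\Big)\\
&\quad + E\Big[P\Big(\min_{\mathcal S_n\cap\mathcal N_n}\frac{\sqrt n}{2}(\theta - \theta_0)^T\tilde I_0(\theta - \theta_0) \ge (C_1/2)n^{1/2-2\psi}\,\Big|\,\mathcal S_n\Big)\Big].
\end{aligned}$$

The first two quantities in the last inequality approach zero by choosing proper $C_1$ and $C$, according to an analysis similar to that in the proof of Theorem 1. We next analyze the last quantity:

$$\begin{aligned}
E\Big[P\Big(\min_{\mathcal S_n\cap\mathcal N_n}\frac{\sqrt n}{2}(\theta - \theta_0)^T\tilde I_0(\theta - \theta_0) \ge (C_1/2)n^{1/2-2\psi}\,\Big|\,\mathcal S_n\Big)\Big] &\le E\Big[P\Big(\min_{\mathcal S_n}(\theta - \theta_0)^T\tilde I_0(\theta - \theta_0) \ge C_1n^{-2\psi}\,\Big|\,\mathcal S_n\Big)\Big]\\
&\le \big[1 - P\big(\|\tilde\theta - \theta_0\|^2 < c_1C_1/\mathrm{card}(\mathcal S_n)\big)\big]^{\mathrm{card}(\mathcal S_n)}\\
&\le \big(1 - pC_1/\mathrm{card}(\mathcal S_n)\big)^{\mathrm{card}(\mathcal S_n)} \to \exp(-pC_1),
\end{aligned}$$

where $c_1 > 0$ and $p > 0$ are constants. The second inequality follows since the cardinality of $\mathcal S_n$ is larger than $cn^{2\psi}$. The last inequality follows from the density of $\tilde\theta$ being bounded away from zero around $\theta_0$. Since $\exp(-pC_1)$ can be made arbitrarily small by choosing $C_1$ large, this completes the proof of Corollary 1. □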

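Proof of Theorem 2: We first prove Lemma 2 below.

Lemma 2. Assume the conditions of Theorem 2 and that

$$\Pi_n(\hat\theta_n^{(k-1)}, t_n) - \tilde I_0 = O_P(r_n^{(k-1)}). \qquad (6.6)$$

Then we have

$$\hat\theta_n^{(k)} - \hat\theta_n = O_P\Big(r_n^{(k-1)}\|\hat\theta_n^{(k-1)} - \hat\theta_n\| \vee \frac{g_r(n^{-1/2} \vee |s_n| \vee \|\hat\theta_n^{(k-1)} - \hat\theta_n\|)}{n|s_n|} \vee |s_n|\Big) \qquad (6.7)$$

for $k = 1, 2, \ldots$.

Proof: Based on (1.4), we have

$$\sqrt n\,\Pi_n(\hat\theta_n^{(k-1)}, t_n)(\hat\theta_n^{(k)} - \hat\theta_n) = \sqrt n\,\Pi_n(\hat\theta_n^{(k-1)}, t_n)(\hat\theta_n^{(k-1)} - \hat\theta_n) + \sqrt n\,\Gamma_n(\hat\theta_n, s_n) + \sqrt n\big(\Gamma_n(\hat\theta_n^{(k-1)}, s_n) - \Gamma_n(\hat\theta_n, s_n)\big). \qquad (6.8)$$

The second term in the above equation equals

$$O_P\Big(\sqrt n|s_n| \vee \frac{g_r(|s_n|)}{\sqrt n|s_n|}\Big)$$

according to (3.6) and (6.4). The third term in (6.8) can be written as

$$-\sqrt n\,\tilde I_0(\hat\theta_n^{(k-1)} - \hat\theta_n) + O_P\Big(\sqrt n|s_n| \vee \frac{g_r(n^{-1/2} \vee |s_n| \vee \|\hat\theta_n^{(k-1)} - \hat\theta_n\|)}{\sqrt n|s_n|}\Big)$$

for $k = 1, 2, \ldots$, by replacing $\hat\theta_n^{(0)}$ with $\hat\theta_n$ and $U_n$ with $\hat\theta_n^{(k-1)} - \hat\theta_n$ in (6.2). Combining the above analysis, assumption (6.6) and the nonsingularity of $\tilde I_0$, we complete the proof of (6.7). □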

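We next prove (3.8)-(3.10). Combining (6.3) with (6.7), we obtain

$$\hat\theta_n^{(k)} - \hat\theta_n = O_P\Big(\frac{g_r(|t_n| \vee \|\hat\theta_n^{(k-1)} - \hat\theta_n\|)\,\|\hat\theta_n^{(k-1)} - \hat\theta_n\|}{nt_n^2} \vee \frac{g_r(|s_n| \vee n^{-1/2} \vee \|\hat\theta_n^{(k-1)} - \hat\theta_n\|)}{n|s_n|} \vee |s_n|\Big) = O_P\big(f_{k-1}(|t_n|) \vee g_{k-1}(|s_n|)\big),$$

where $f_{k-1}(|t_n|)$ denotes the first term and $g_{k-1}(|s_n|)$ the maximum of the last two terms. Considering the form of $g_r(\cdot)$ specified in Lemma 1, we have $g_{k-1}(|s_n|) \ge g(|s_n|) = |s_n| \vee n^{-2r-1/2}|s_n|^{-1} \vee n^{-3/2}|s_n|^{-1}$. The smallest order of $g(|s_n|)$ is $n^{-3/4}$ ($n^{-r-1/4}$) when $r \ge 1/2$ ($1/4 < r < 1/2$), attained by choosing $s_n \asymp n^{-3/4}$ ($s_n \asymp n^{-r-1/4}$). The above analysis implies (3.8).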
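The step-size choice in this argument only concerns orders of magnitude. The sketch below, with hypothetical names and all constants set to one, returns the order of $s_n$ that balances the terms of $g(|s_n|)$ and evaluates that bound, so one can check numerically that moving the step size by a factor of ten in either direction increases the bound.

```python
def optimal_step_size(n, r):
    """Order of s_n minimizing max(|s|, n^(-2r-1/2)/|s|, n^(-3/2)/|s|):
    n^(-3/4) when r >= 1/2 and n^(-r-1/4) when 1/4 < r < 1/2 (constants ignored)."""
    if r >= 0.5:
        return n ** -0.75
    if r > 0.25:
        return n ** (-r - 0.25)
    raise ValueError("Theorem 2 assumes r > 1/4")


def step_bound(s, n, r):
    """The bound g(|s_n|) = |s| v n^(-2r-1/2) |s|^(-1) v n^(-3/2) |s|^(-1)."""
    return max(s, n ** (-2 * r - 0.5) / s, n ** -1.5 / s)


n = 10 ** 6
for r in (1 / 3, 0.4, 0.5):
    s_opt = optimal_step_size(n, r)
    print(r, s_opt, step_bound(s_opt, n, r))
```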

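In the proof of (3.9) and (3.10) below, we consider the cases $r \ge 1/2$ and $1/4 < r < 1/2$ separately. For $r \ge 1/2$, some algebra shows that for $k \ge 1$

$$\hat\theta_n^{(k)} - \hat\theta_n = O_P(\|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{3/2}) \qquad (6.9)$$

when $\|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{-1} = O_P(\sqrt n)$, $s_n^* \asymp \|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{3/2}$ and $t_n^* \asymp \|\hat\theta_n^{(k-1)} - \hat\theta_n\|$. And when $\|\hat\theta_n^{(k-1)} - \hat\theta_n\| = O_P(n^{-1/2})$, $\|\hat\theta_n^{(k)} - \hat\theta_n\|$ achieves the optimal rate $O_P(n^{-3/4})$ given $s_n^* \asymp n^{-3/4}$ and $t_n^* \asymp n^{-1/2}$. Thus we only need to determine how many iterative steps are needed for the k-step MLE to achieve the root-n rate. From (6.9), we know that the $N_1$-step MLE converges at rate $O_P(n^{-1/2})$, where $N_1 = \mathrm{int}[\log(2\psi)/\log(2/3)]$, given that $\hat\theta_n^{(0)}$ is $n^\psi$-consistent. One further step then attains $O_P(n^{-3/4})$, so that $N = N_1 + 1$. This concludes the proof for $r \ge 1/2$.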
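We next show (3.10) when $1/4 < r < 1/2$. Similarly, we have for $k \ge 1$

$$\hat\theta_n^{(k)} - \hat\theta_n = O_P(\|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{3/2}) \qquad (6.10)$$

if $\|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{-1} = O_P(n^r)$, $s_n^* \asymp \|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{3/2}$ and $t_n^* \asymp \|\hat\theta_n^{(k-1)} - \hat\theta_n\|$. However, if $\|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{-1} = O_P(n^{1/2})$ and $\|\hat\theta_n^{(k-1)} - \hat\theta_n\| = O_P(n^{-r})$, we have for $k \ge 1$

$$\hat\theta_n^{(k)} - \hat\theta_n = O_P(\|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{1/2}n^{-r}) \qquad (6.11)$$

given $s_n^* \asymp \|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{1/2}n^{-r}$ and $t_n^* \asymp n^{-r}$. We next consider two-stage iterations for the K-step MLE. If $\hat\theta_n^{(0)}$ is $n^\psi$-consistent for $\psi < r$, then, based on (6.10), at least $M_1$ iterations are needed such that $\|\hat\theta_n^{(M_1)} - \hat\theta_n\| = O_P(n^{-r})$, where $M_1 = \mathrm{int}[\log(\psi/r)/\log(2/3)]$. Once the K-step MLE has achieved $n^r$-consistency, we further need $M_2$ steps to achieve the root-n rate, where $M_2 = \mathrm{int}[\log(4r/(4r-1))/\log 2 - 1]$, from (6.11). One final step then attains the rate $n^{-r-1/4}$, so that $M = M_1 + M_2 + 1$. This completes the proof of Theorem 2. □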
Table 1. Cox regression with right censored data ($\theta_0 = 1$ and 500 samples).

n      $\hat\theta_n^{(0)}$   $\hat\theta_n^{(1)}$   $\hat\theta_n$   $n^{3/4}|\hat\theta_n^{(1)} - \hat\theta_n|$
50     1.0229                 1.0222                 1.0167           0.1030
100    1.0346                 1.0344                 1.0324           0.0632
200    0.9979                 0.9979                 1.0028           0.2606
500    0.9974                 0.9974                 0.9964           0.1057

Table 2. Cox regression with current status data ($\theta_0 = 1$ and 500 samples).

n      $\hat\theta_n^{(0)}$   $\hat\theta_n^{(1)}$   $\hat\theta_n^{(2)}$   $\hat\theta_n^{(3)}$   $\hat\theta_n$   $n^{7/12}|\hat\theta_n^{(3)} - \hat\theta_n|$
50     1.0452                 1.8218                 1.7226                 1.7563                 1.1962           5.4870
100    0.8017                 0.7604                 0.7997                 0.8289                 0.8541           0.3699
200    0.8118                 0.7692                 0.8474                 0.8425                 0.8859           0.9545
500    0.8376                 0.8364                 0.8757                 0.9592                 0.9896           1.1410

n, sample size; $\hat\theta_n^{(0)}$, initial estimate; $\hat\theta_n^{(1)}$, one step MLE; $\hat\theta_n^{(2)}$, two step MLE; $\hat\theta_n^{(3)}$, three step MLE; $\hat\theta_n$, MLE.



